Deep learning models on the same database have varied accuracy ratings; as
such, additional parameters, such as pre-processing, data augmentation and 
transfer learning, can influence the models’ capacity to obtain higher 
accuracy. In this paper, a fully automated model is designed using a deep
learning algorithm to capture images from patients and to pre-process, segment
and classify the intensity of cancer spread. In the first pre-processing step,
the pectoral muscles are identified and removed from the input images by using
region-growing segmentation, because their removal may be crucial in
classification systems. All mammograms are then downsized to reduce
processing time. Each stage of the
fully automated model uses an optimisation approach to obtain high-accuracy
results at the respective stages. Simulation is conducted to test the efficacy
of the model against state-of-the-art models, and the proposed fully automated
model is thoroughly investigated. For a more accurate comparison, the
pre-trained VGG19 and ResNet50 models are included in the analysis. In a
nutshell, this work offers a wealth of information as well as a review and
discussion of the experimental conditions used by studies on classifying
breast cancer images.

1. INTRODUCTION


Breast cancer is a major cause of death globally and is generally caused by the abnormal behaviour
of cells grown in breasts. These cells may also proliferate in regions where they are not typically seen in
the human body, and this phenomenon is clinically referred to as metastasis. Mammography is the best option
for detecting breast cancer before it spreads. However, the interpretation of mammographic images depends on
the radiologist's experience, which results in many false positives [1], [2].


Astley et al. [3] reported that subjective breast density evaluation is more accurate than automatic
and semi-automated approaches in predicting the risk of breast cancer. Whether mammographic density
causes more aggressive breast cancers remains an open question, and the effect of mammographic density on
prognosis should be investigated. Breast density is one of the metrics used to measure the amount of
fibroglandular tissue visible on mammograms [4].

Journal homepage: http://ijeecs.iaescore.com, ISSN: 2502-4752

The dense tissue found in the breast is of a non-fatty type. It makes abnormalities difficult to
detect on mammograms and is associated with an increased cancer risk: breasts with high-density tissue
are more likely to develop cancer than those with low tissue density [5]. Breast density estimation and
categorisation can be conducted using various computer approaches [6]-[11]. After removing the pectoral
muscles from mammograms, researchers have presented methods for segmenting the dense breast region and
dividing its area by the total breast area [12], [13].

Images of breast density are segmented using various approaches, such as thresholding [14], region
growing [15], clustering [16] and texture statistical variation [17], for classification and estimation. However,
the poor signal-to-noise ratio and the variety of densities in texture and appearance cause difficulty in the
segmentation and classification of breast density. Convolutional neural networks (CNNs) have made significant
progress, particularly in the classification and identification of patterns in an image. In addition, deep
learning (DL) has numerous advantages over other machine learning techniques. Various approaches for
estimating breast density have been reported in the literature [18]-[20]. The steps for classification of
breast cancer are as follows.

In the initial preprocessing step, the pectoral muscles are identified in the input images and
removed by using region-growing segmentation; their removal may be critical in classification systems.
All mammograms are then downsized to reduce the processing time.

In this paper, a fully automated model is designed using a deep learning algorithm to capture images
from patients and to pre-process, segment and classify the intensity of cancer spread. Each stage of the fully
automated model uses an optimisation approach to obtain high-accuracy results at the respective stages. The
simulation is conducted to test the efficacy of the model against state-of-the-art models. The proposed fully
automated model has higher accuracy than the other methods.

The remainder of this paper is organised as follows. Section 2 reviews related works. Section 3
describes the proposed method. Section 4 presents the results and discussion. Section 5 compares the
proposed model with different pre-trained deep learning models. Section 6 presents the conclusion.


2. RELATED WORKS 

The ideas of [1] were carried forward by the Cumulus software, which has expanded the resources and
technology available for investigating the causes of breast cancer. The Cumulus programme [2] is a smart way to
understand the risks of breast cancer; the program conducts estimations based on the threshold level used
to segment the dense tissue. In this research, the area of the dense breast region is classified into six
percentage categories. This thresholding-based strategy has significant drawbacks because it often lacks the
precision needed for accurate segmentation.

Breast density can be classified using imaging systems and reported in conformance with previously
employed standards [3]. Automated procedures, including radiodensity assessment [4], are freely available.
LIBRA considers 86 variables, including global characteristics, such as patient age and X-ray breast
thickness, as well as parameters, such as disconnected areas and Z-score mean. This software analyses areas
in breast regions by using mammography images [4]; it further helps to estimate percentage density and
dense tissue area. This older, handcrafted approach has an accuracy of 0.81 in estimating breast density but
is time-consuming and difficult to use.

Researchers have devised a new algorithm that estimates the percentage density (PD) in agreement
with BI-RADS density ratings. The CC-MLO-averaged accuracy of
the algorithm is 0.98, which is higher than the accuracy of LIBRA. Volumetric breast density can be
quantified using volume-based approaches [5]. For each pixel in a mammography image, Quantra analyses
fibroglandular tissue thickness and X-ray attenuation to calculate the tissue volume in regions around the
breast. The amount of tissue in each pixel, including dense and non-dense areas, is also considered.

Fully automated approaches have been used for quantitative assessment of breast density.
Volumetric estimation is used to calculate the density of breast tissues [6]-[8]. The use of deep learning methods
to investigate breast density has opened a new avenue in machine learning. Deep learning approaches have
been applied to obtain better results in extracting features from mammograms. Neural networks, such as
CNNs, help in classification by learning features from pre-processed raw images and representing objects at
varied scales and orientations. The CNN is a popular method among
deep learning models.

Unsupervised deep learning methods, such as those developed by Kallenberg et al. [9], can be used
to acquire information from features, including fatty and dense tissues. A CNN has been utilised to perform
unsupervised feature learning on breast density areas in mammograms by using unlabelled imaging data.
Sub-images based on dense or fatty regions are created from the input mammogram.

A previous study [10] that used a state-of-the-art deep learning model for breast tissue segmentation
reported an average accuracy of 0.9%. The FCNN developed in [11] was used to automatically segment dense
fibroglandular regions on mammograms [12], [13]. A total of 455 mammography images containing 58
instances were used in the evaluation. An ImageNet-trained VGG16 was fine-tuned for breast density estimation
and segmentation.

For breast density classification, deep learning utilizes the structure of CNNs and the usual BI-RADS
categorisation. Leila et al. [14] presented two classifiers for categorizing breast density, one of which is
based on the CNN-AlexNet model. When low-quality images were excluded, the classification accuracy
increased to 98%. The study of Leila et al. [15] divided the dense and fatty regions of the breast into two
distinct areas. Three convolution layers were employed in a deep CNN that included six phases. The first
three phases were used to generate features, while the last three phases were used to forecast the chance of
occurrence.

For CNN training, [16] used a patch-wise supervised methodology to classify features on
mammography images. The raw DNN output was 0.80, and the post-processed output of the
deep neural network was 0.81. Many breast density classification methods have been developed, but very few
studies have reached an accuracy greater than 90%, although a CNN is considered a more sophisticated model
than the conventional methods. Breast cancer is the greatest cause of mortality among malignancies diagnosed
in women. Despite several efforts to resolve this issue, a definitive answer has yet to be developed. The
research gaps or shortcomings of prior studies are outlined as follows:

- Accurate detection of breast cancer by using automated approaches remains a key scientific problem. 

- This problem is exacerbated by the fact that practically all accessible datasets are imbalanced, meaning that
the number of instances in one class substantially outnumbers those in all other classes.

These research gaps and recognised limitations can be addressed by the following approaches:

- The SMOTE approach, which oversamples the minority classes with synthetic samples, addresses the
dataset imbalance problem and improves performance on images of breast tumours.

- Histopathologists are well-versed in labelling lesion reports and histopathology images. Deep learning is
utilised to address the weak generalisation capacity and over-parameterised networks, which result in
overfitting.
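
The oversampling idea behind SMOTE can be illustrated with a minimal sketch. It assumes that each image has already been reduced to a numeric feature vector; the function name `smote_sketch` and the parameters `n_new`, `k` and `seed` are hypothetical and are not taken from this study. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import numpy as np


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; expects at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between all minority samples.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)
        other = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic[i] = minority[base] + gap * (minority[other] - minority[base])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_sketch(minority_features, n_new=10)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region spanned by the minority class rather than duplicating existing samples.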


3. PROPOSED METHOD 

This section explains the methodology employed in the study. As depicted in Figure 1, the entire
research approach involves the classification of tissues in a mammographic breast image that includes breast
tissue, pectoral muscles and background regions. In the first stage, the pectoral muscles are removed from the
mammograms. Following this step, the mammograms, which have varying breast density levels, are rescaled to
512x512 pixels. Finally, a binary mask comprising dense tissue is created by processing the
mammograms. Two methods are used to classify breast density: first, the binary mask is fed as
input to a multi-class CNN classifier that assigns the density of breast tissues to one of four classes;
second, the percentage density of the tissue is estimated from the mask. Step-by-step instructions for
preparing the dataset for the proposed model are outlined in the following section.


[Figure 1 pipeline: INbreast dataset, data preprocessing, feature extraction, classification, predicted outcomes]

Figure 1. Proposed architecture


3.1. Preprocessing

The pectoral muscles are removed from the input images, which are then resized, as the initial part of
preprocessing. Processing techniques may falsely detect dense tissue areas on mammographic images because of the
overlap and high intensity between the glandular tissue and the pectoral muscles. As shown in Figure 2, the breast
area and pectoral muscle are first separated from the background, and then the mammogram orientation is
established. Finally, the pectoral muscle is removed from the image by using region-growing segmentation.
Figure 2(a) shows original images, and Figure 2(b) shows an example of pectoral muscle removal. To save
computing time, all mammograms are resized from 2,560x3,328 or 3,328x4,084 pixels to 512x512 pixels, the
resolution that yields the best accuracy for the segmentation stage.


Figure 2. The breast area and pectoral muscle are separated from the background first, and then the
mammography orientation is established: (a) original images and (b) pectoral muscle removal
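
The region-growing step can be sketched as follows. This is a minimal illustration and not the implementation used in the study: it assumes that, once the orientation is established, the pectoral muscle touches a known corner pixel, and that a fixed intensity tolerance separates it from breast tissue. The names `region_grow`, `remove_region`, `seed` and `tol` are hypothetical.

```python
from collections import deque

import numpy as np


def region_grow(image, seed, tol):
    """Return a boolean mask of pixels 4-connected to `seed` whose intensity
    lies within `tol` of the seed intensity."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    mask = np.zeros((height, width), dtype=bool)
    seed_value = image[seed]
    mask[seed] = True
    queue = deque([seed])
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n_row, n_col = row + d_row, col + d_col
            inside = 0 <= n_row < height and 0 <= n_col < width
            if inside and not mask[n_row, n_col]:
                if abs(image[n_row, n_col] - seed_value) <= tol:
                    mask[n_row, n_col] = True
                    queue.append((n_row, n_col))
    return mask


def remove_region(image, mask):
    """Set the grown region (for example, the pectoral muscle) to zero."""
    cleaned = np.array(image, dtype=float, copy=True)
    cleaned[mask] = 0.0
    return cleaned


# Toy image: a bright "muscle" patch in the top-left corner.
toy = [[200, 200, 50, 10],
       [200, 60, 50, 10],
       [55, 50, 50, 10],
       [10, 10, 10, 10]]
muscle_mask = region_grow(toy, seed=(0, 0), tol=20)
toy_cleaned = remove_region(toy, muscle_mask)
```

In practice the seed position and the tolerance would have to be chosen per image or per dataset; the values above are only for the toy example.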


3.2. Breast density classification 
A CNN model is used to classify breast density into four classes. The percentage-based estimation
and the deep learning classifier are outlined below.


3.2.1. Percentage estimation on breast density 
The percentage density is the dense tissue area in a mammography image expressed as a percentage of the
total surface area of the breast region. The traditional method comprises five stages:
- First, the mask images are resized to the same resolution as the input mammograms.
- The breast area is measured by counting the non-zero pixels in the input mammography image.
- The dense tissue area is measured by counting the non-zero pixels in the binary mask.
- The ratio of the dense tissue area to the breast area is computed to estimate the density of the tissue area.
- The density of the breast is finally classified by a multi-class classifier into four categories based on
thresholding procedures.
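
The stages above can be sketched in a few lines. The sketch assumes that the mask has already been resized to the mammogram resolution and that background pixels are zero in both arrays. The cut-offs of 25%, 50% and 75% are placeholder values chosen for illustration, because the threshold values are not stated here; `percentage_density` and `density_class` are hypothetical names.

```python
import numpy as np


def percentage_density(mammogram, dense_mask):
    """Dense-tissue area as a percentage of the breast area (non-zero pixel counts)."""
    mammogram = np.asarray(mammogram)
    dense_mask = np.asarray(dense_mask)
    if mammogram.shape != dense_mask.shape:
        raise ValueError("mask must be resized to the mammogram resolution first")
    breast_area = np.count_nonzero(mammogram)
    dense_area = np.count_nonzero(dense_mask)
    if breast_area == 0:
        return 0.0
    return 100.0 * dense_area / breast_area


def density_class(pd_percent, thresholds=(25.0, 50.0, 75.0)):
    """Map a percentage density to classes 1..4 (thresholds are assumed values)."""
    return 1 + sum(pd_percent >= t for t in thresholds)


# Toy example: 8 breast pixels, 2 of which are dense.
toy_mammogram = np.zeros((4, 4))
toy_mammogram[:2, :] = 1
toy_mask = np.zeros((4, 4))
toy_mask[0, :2] = 1
toy_pd = percentage_density(toy_mammogram, toy_mask)
toy_class = density_class(toy_pd)
```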


3.2.2. Breast density classification using DL 

Breast density is difficult to categorize using most approaches because of their computational
complexity. The proposed network has three convolution layers with kernel sizes of 4x4, 5x5 and 9x9, followed
by fully connected (FC) layers. Max pooling with a stride of 4x4 is applied after two of the convolution layers.
The output of the last convolution layer is flattened before it is sent into the first FC
layer with 128 neurons. ReLU is used as the activation function for the four layers.

In the first FC layer, a dropout of 0.5 is used to reduce overfitting. The final FC layer has four
neurons, and the softmax function is used to determine the membership degree of a binary mask input for each
class. The effect of the unbalanced dataset is mitigated by using a class-weighted cross-entropy loss. The
weight of each class is one minus the ratio of the number of samples in that class to the overall sample count.
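
A minimal sketch of such a class-weighted cross-entropy is given below, reading the class weight as w_c = 1 - n_c / N, where n_c is the number of samples in class c and N is the overall sample count. The per-class counts in the example are hypothetical and are not taken from the dataset; the function names are also illustrative.

```python
import math


def class_weights(counts):
    """Weight per class: one minus the share of that class in the dataset."""
    total = sum(counts)
    return [1.0 - n / total for n in counts]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def weighted_cross_entropy(logits, target, weights):
    """Cross-entropy of one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -weights[target] * math.log(probs[target])


# Hypothetical counts for four density classes (not from the paper).
example_counts = [10, 30, 40, 20]
example_weights = class_weights(example_counts)
# Uniform logits give probability 0.25 per class.
example_loss = weighted_cross_entropy([0.0, 0.0, 0.0, 0.0], target=0, weights=example_weights)
```

Rare classes receive weights close to one and frequent classes receive smaller weights, so misclassifying a rare class contributes more to the loss.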

The model is optimised with RMSProp using a momentum of 0.9, a learning rate of 0.001 and a batch
size of 16. The weights of the five network layers are randomly initialised before the network is trained. During
training, the number of layers, the optimum architecture and the numbers of filters and neurons are determined.
The architecture of the CNN model used in this study is shown in Figure 3. The model, which ends in a softmax
output, was slightly modified to receive input images with the dimension shown in Figure 3.


Indonesian J Elec Eng & Comp Sci, Vol. 28, No. 1, October 2022: 183-191 




[Figure 3 layer sequence: generated binary mask (128x128x1); max pooling 4x4 (128x128x64); max pooling 4x4
(32x32x128); max pooling 4x4 (32x32x128); max pooling 4x4 (8x8x128); fully connected layer (1x1x256);
softmax; class labels (output)]

Figure 3. Proposed CNN architecture
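
The feature-map sizes in Figure 3 can be checked with a short shape trace. The sketch assumes that each convolution uses padding that preserves the spatial size and that 4x4 max pooling with a stride of 4 follows the first and third convolutions. This ordering is an assumption inferred from the sizes in the figure, not a confirmed description of the network, and the helper names are hypothetical.

```python
def conv_same(shape, out_channels):
    """Convolution assumed to keep height and width and to change only the channel count."""
    height, width, _ = shape
    return (height, width, out_channels)


def max_pool(shape, kernel=4, stride=4):
    """Max pooling without padding: floor((size - kernel) / stride) + 1 per axis."""
    height, width, channels = shape
    return ((height - kernel) // stride + 1, (width - kernel) // stride + 1, channels)


shape = (128, 128, 1)             # generated binary mask
trace = [shape]
shape = conv_same(shape, 64)      # first convolution
trace.append(shape)
shape = max_pool(shape)           # 128 -> 32
trace.append(shape)
shape = conv_same(shape, 128)     # second convolution
trace.append(shape)
shape = conv_same(shape, 128)     # third convolution
trace.append(shape)
shape = max_pool(shape)           # 32 -> 8
trace.append(shape)

# Number of values passed to the first fully connected layer after flattening.
flattened = shape[0] * shape[1] * shape[2]
```

Under these assumptions, two pooling steps reduce the 128x128 input to the 8x8x128 map listed in the figure.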


4. RESULTS AND DISCUSSION 

The proposed approach is implemented in Python 3.5 with the PyTorch library on 64-bit Ubuntu.
The CPU is a 3.4 GHz Intel Core i7, and the GPU is an NVIDIA 1070
with 16 GB RAM. The dataset used in the study is the INbreast dataset [21], a publicly available
2D database that comprises 410 mammographic images in MLO and CC views.

INbreast breast density categorization is based on the Breast Imaging Reporting and Data System
(BI-RADS) standard. The image size of a mammogram is 3,328x4,084 pixels. Ground-truth binary masks for
segmenting breast density are missing from the INbreast dataset. As a result, the images are
annotated by breast cancer radiologists.

Among 115 patients, 80% of the images from the dataset are used for training and the remaining
20% are used as the test dataset. Cross-validation is conducted on images from 82 patients (set 1: 6
images, set 2: 20 images, set 3: 27 images, set 4: 29 images) to train the CNN classifier, and the remaining
images are used to validate the network. The CNN-based classification approach performs better with
a balanced dataset than with an imbalanced dataset [21]. The CNN achieves its lowest overall accuracy of
90.29% with an input size of 128x128 on the unbalanced dataset. After augmentation is applied to the varied
input image sizes, the classification rates are 98.75% and 98.62%. The CNN-based classification approach
with an input size of 128x128 on the balanced dataset shows an overall accuracy of 98.75% (Figure 4),
which is improved by 0.13% compared with existing classifiers [22].
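
An 80/20 split of this kind can be sketched as follows. The sketch splits at the patient level so that images of one patient never appear on both sides; this is an assumption made for illustration, since the text above only states the 80/20 proportion. The function `split_by_patient` and the toy patient identifiers are hypothetical.

```python
import random


def split_by_patient(patient_ids, train_fraction=0.8, seed=0):
    """Return (train_indices, test_indices) so that no patient is in both sets."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = round(train_fraction * len(unique))
    train_patients = set(unique[:n_train])
    train_idx = [i for i, p in enumerate(patient_ids) if p in train_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p not in train_patients]
    return train_idx, test_idx


# Toy data: 10 patients with 4 images each (one patient id per image).
toy_patient_ids = [i // 4 for i in range(40)]
train_idx, test_idx = split_by_patient(toy_patient_ids)
```

Fixing the seed makes the split reproducible across runs, which helps when comparing several models on the same partition.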

Experiments on the two datasets are conducted for the classification of breast density using trained 
CNN: one for a balanced dataset and one for an imbalanced dataset [23]. The 64x64 and 128x128 images are 
used to test the network [24], [25]. The generated binary image from the segmentation is given as an input to 
the CNN-based approach. The outputs (Figures 5-8) of this network are employed to classify the density of 
breast tissues in mammograms. 

According to the results of the CNN-based classification approach, the data augmentation and the
construction of balanced datasets from the overall unbalanced datasets increase the accuracy of classification.
The use of these classification systems helps to maximize the accuracy and minimize the complexity. A
strong correlation exists between the proposed approach and the radiologists' manual categorization, which
is comparable with the correlation coefficients reported in the literature.


Figure 8. Percentage error


5. COMPARISON BETWEEN PROPOSED AND DIFFERENT DEEP LEARNING PRE-TRAINED MODELS
This section compares the training and validation loss and accuracy obtained on the INbreast dataset
for the VGG19, ResNet50 and CNN models. Table 1 lists the performance metrics of the existing pre-trained
VGG19 and ResNet50 models and of the proposed CNN model.


Table 1. Comparison of performance 


Model            Training loss   Training acc   Validation loss   Validation acc
VGG19 [23]       0.2199          0.9113         0.3675            0.8544
ResNet50 [22]    0.2722          0.8871         0.3703            0.8443
(Proposed)       0.1314          0.9489         0.3374            0.8832
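
The comparison in Table 1 can be reproduced with a few lines of arithmetic. The values below are copied from Table 1; the dictionary layout and variable names are illustrative only. The sketch selects the model with the highest validation accuracy and computes the gap between training and validation accuracy for each model.

```python
# Values copied from Table 1.
table1 = {
    "VGG19": {"train_loss": 0.2199, "train_acc": 0.9113, "val_loss": 0.3675, "val_acc": 0.8544},
    "ResNet50": {"train_loss": 0.2722, "train_acc": 0.8871, "val_loss": 0.3703, "val_acc": 0.8443},
    "Proposed": {"train_loss": 0.1314, "train_acc": 0.9489, "val_loss": 0.3374, "val_acc": 0.8832},
}

# Model with the highest validation accuracy and the lowest validation loss.
best_by_val_acc = max(table1, key=lambda name: table1[name]["val_acc"])
best_by_val_loss = min(table1, key=lambda name: table1[name]["val_loss"])

# Gap between training and validation accuracy for each model.
acc_gap = {name: m["train_acc"] - m["val_acc"] for name, m in table1.items()}

for name in table1:
    print(f"{name}: val_acc={table1[name]['val_acc']:.4f}, gap={acc_gap[name]:.4f}")
```

On these numbers the proposed model has both the highest validation accuracy and the lowest validation loss of the three.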


6. CONCLUSION 

In this paper, a fully automated model is designed that captures images from patients and
pre-processes, segments and classifies the intensity of cancer spread using a deep learning algorithm. The
simulation is conducted to test the efficacy of the model against state-of-the-art models, and the results show
that the proposed fully automated model obtains higher accuracy than the other methods. A total of 410
images from the dataset are used in the evaluation. This technique obtained a 98% success rate in the
classification of breast density. In the future, we will examine deep learning with complex architectures, such
as several layers and neurons, and test them on large groups of participants. The most exciting future
initiatives, on the other hand, are learning and identifying visual signals associated with histological design
and morphology to obtain quantitative semantic parameters, such as the fraction of dead tissue or neoplasm
in whole-slide images (WSI).